DAFx Paper Archive - Browse all papers by Sonnleitner, R.

Unsupervised Feature Learning for Speech and Music Detection in Radio Broadcasts

DAFx-2012 - York

Detecting speech and music is an elementary step in extracting information from radio broadcasts. Existing solutions either rely on general-purpose audio features, or build on features specifically engineered for the task. Interpreting spectrograms as images, we can apply unsupervised feature learning methods from computer vision instead. In this work, we show that features learned by a mean-covariance Restricted Boltzmann Machine partly resemble engineered features, but outperform three hand-crafted feature sets in speech and music detection on a large corpus of radio recordings. Our results demonstrate that unsupervised learning is a powerful alternative to knowledge engineering.

A Simple and Effective Spectral Feature for Speech Detection in Mixed Audio Signals

Reinhard Sonnleitner; Bernhard Niedermayer; Gerhard Widmer; Jan Schlüter

DAFx-2012 - York

We present a simple and intuitive spectral feature for detecting the presence of spoken speech in mixed (speech, music, arbitrary sounds and noises) audio signals. The feature is based on some simple observations about the appearance, in signals that contain speech, of harmonics with characteristic trajectories. Experiments with some 70 hours of radio broadcasts in five different languages demonstrate that the feature is very effective in detecting and delineating segments that contain speech, and that it also seems to be quite general and robust w.r.t. different languages.

Quad-Based Audio Fingerprinting Robust to Time and Frequency Scaling

Reinhard Sonnleitner; Gerhard Widmer

DAFx-2014 - Erlangen

We propose a new audio fingerprinting method that adapts findings from the field of blind astrometry to define simple, efficiently representable characteristic feature combinations called quads. Based on these, an audio identification algorithm is described that is robust to large amounts of noise and speed, tempo and pitch-shifting distortions. In addition to reliably identifying audio queries that are modified in this way, it also accurately estimates the scaling factors of the applied time/frequency distortions. We experimentally evaluate the performance of the method for a diverse set of noise, speed and tempo modifications, and identify a number of advantages of the new method over a recently published distortion-invariant audio copy detection algorithm.
